Libraries
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggExtra)
library(ggplot2)
source('TeachBayes.r')
library(rjags)
## Loading required package: coda
## Linked to JAGS 4.3.0
## Loaded modules: basemod,bugs
library(ggpubr)
###Aim
2020 marked the first-ever occurrence of predicted grades for students in Ireland: the Irish Government had to provide an estimated mark in every subject for each student in the country, and there were many issues with its model. Using Bayesian statistics and a dataset of student grades, we therefore want to build a model that can predict an end-of-year grade from personal and academic characteristics. Our target population is all future students.
Our objective is to create a model that can accurately predict end-of-year grades from student information. We aim to use variables such as Mock Examination Results, Gender, Study Time, Absences, Failures, and Parents' Education. Our parameters of interest are G3 (end-of-semester grade), Sex, StudyTime, Absences, Freetime, and Travel Time.
###Subjective Impression
We found a dataset online. The dataset includes grades and student information for 1044 students from a school in Portugal. The Portuguese system measures grades from 0 to 20; for convenience we have multiplied the grades by 5 to put them out of 100.
#Load data from Github
math_student_data_url <- "https://raw.githubusercontent.com/arunk13/MSDA-Assignments/master/IS607Fall2015/Assignment3/student-mat.csv";
math_student_data <- read.table(file = math_student_data_url, header = TRUE, sep = ";");
eng_student_data_url <- "https://raw.githubusercontent.com/arunk13/MSDA-Assignments/master/IS607Fall2015/Assignment3/student-por.csv";
eng_student_data <- read.table(file = eng_student_data_url, header = TRUE, sep = ";")
studentdata <- rbind(math_student_data, eng_student_data)
#Clean Data
studentdata[c("G1","G2","G3")] <- studentdata[c("G1","G2","G3")] * 5
#Estimate Mean Grade
studentdata %>% summarise(N = n(), mean = mean(G3), sd = sd(G3))
## N mean sd
## 1 1044 56.70977 19.32398
There are N=1044 students in our dataset. The mean final grade is 56.71, while the standard deviation is 19.32. With normal data, most of the observations lie within three standard deviations of the mean.
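As a quick sanity check on that rule of thumb, we can compute the proportion of grades falling within three standard deviations of the mean (a small illustrative check, not part of the original analysis):

```r
# Proportion of G3 grades within 3 standard deviations of the mean
within3 <- abs(studentdata$G3 - mean(studentdata$G3)) <= 3 * sd(studentdata$G3)
mean(within3)
```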
#Defining prior
m0 <- 60
s0 <- 1 #Sigma_0 (a small prior sd, reflecting fairly strong prior confidence in a mean near 60)
#Likelihood
xbar <- mean(studentdata$G3)
sigma <- 15
n <- length(studentdata$G3)
se <- sigma/sqrt(n)
###Formal Analysis
#Plot the prior, data and posterior
Prior <- c(m0, s0)
Data <- c(xbar, se)
Posterior <- round(normal_update(Prior, Data),3)
x <- seq(0, 100, length=1000)
priorx <- dnorm(x, mean=m0, sd=s0)
datax <- dnorm(x, mean=xbar, sd=se)
postx <- dnorm(x, mean=Posterior[1], sd=Posterior[2])
plot(x, priorx, type='l',lwd=3,xlim = c(50,65),ylim=c(0,1),col = 'blue', main = '', xlab = 'theta', ylab = '')
lines(x, datax,col='black',lwd=3)
lines(x, postx,col='red',lwd=3)
legend("topright", c("Prior","Data","Posterior"), lty = 1, lwd= 3, col = c('blue','black','red'))
The posterior has shifted to the left, towards the data. Our prior belief is a normal distribution \(X\sim N(\theta,\sigma^2)\) with mean = 60 and \(\sigma\) = 1. The posterior mode is now approximately 57. We can also see that the posterior is more concentrated than both the data and the prior, taking advantage of the combined information from both.
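The conjugate normal-normal update that `normal_update` performs can be sketched by hand (a self-contained check, assuming known \(\sigma\) = 15 as above; precisions add, and the posterior mean is a precision-weighted average):

```r
# Hand computation of the conjugate normal-normal update
# (a sketch of what TeachBayes's normal_update computes; sigma assumed known)
m0 <- 60;  s0 <- 1                  # prior mean and sd
xbar <- 56.70977                    # sample mean of G3
se   <- 15 / sqrt(1044)             # standard error of the mean
post_prec <- 1 / s0^2 + 1 / se^2    # posterior precision = sum of precisions
post_sd   <- sqrt(1 / post_prec)
post_mean <- (m0 / s0^2 + xbar / se^2) / post_prec
c(post_mean, post_sd)               # approx. 57.29 and 0.42
```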
#Measures of spread
mysims <- rnorm(10000, mean = Posterior[1], sd = Posterior[2])
diff(quantile(mysims, probs = c(0.25,0.75)))
## 75%
## 0.5693389
##Comparing Grades by Sex
We'd like to see if there is a difference between grades for male and female students.
studentdata %>% group_by(sex) %>% summarise(mean = mean(G3), n = length(sex), sigma = sd(G3))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 4
## sex mean n sigma
## <chr> <dbl> <int> <dbl>
## 1 F 57.2 591 19.4
## 2 M 56.0 453 19.3
n_F <- sum(studentdata$sex=='F')
n_M <- sum(studentdata$sex=='M')
Summarising the mean grade by sex, we can see that females have a mean grade (57.24) slightly higher than males (56.02). However, the difference is quite small.
xbar1_F <- studentdata %>% filter(studentdata$sex == 'F') %>%
summarise(mean(G3)) %>% as.numeric()
xbar2_M <- studentdata %>% filter(studentdata$sex == 'M') %>%
summarise(mean(G3)) %>% as.numeric()
Prior <- c(m0, s0)
Grades_Female <- c(xbar1_F, se)
Grades_Male <- c(xbar2_M, se)
Posterior1_F <- normal_update(Prior, Grades_Female)
Posterior2_M <- normal_update(Prior, Grades_Male)
many_normal_plots(list(Prior,Posterior1_F,Posterior2_M)) + theme(legend.position = c(0.75,0.75))
In the plot above, the blue curve is our prior. The red curve represents the posterior for the average final grade of male students, which has a mode of 56.7, while the green curve represents the posterior for the average final grade of female students, which has a mode of 57.7.
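One way to quantify this difference is to simulate from the two posteriors and estimate the probability that the female mean grade exceeds the male mean grade (an illustrative addition, reusing `Posterior1_F` and `Posterior2_M` as `c(mean, sd)` vectors, as in the plotting above):

```r
# Posterior probability that the female mean grade exceeds the male mean grade
set.seed(1)
theta_F <- rnorm(10000, mean = Posterior1_F[1], sd = Posterior1_F[2])
theta_M <- rnorm(10000, mean = Posterior2_M[1], sd = Posterior2_M[2])
mean(theta_F > theta_M)
```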
#Estimate Mean and SD using rjags
# Take 10000 samples from the theta prior
prior_m <- rnorm(10000, 56.7, 15)
# Take 10000 samples from the sigma prior
prior_s <- runif(10000, 0, 50)
samples <- data.frame(prior_m, prior_s)
grade_model <- "model{
#Likelihood model for X
for(i in 1:length(X)) {
X[i] ~ dnorm(theta, sigma^(-2))
}
# Prior models for theta and sigma
theta ~ dnorm(56.7, 15^(-2))
sigma ~ dunif(0, 50)
}"
grade_jags <- jags.model(textConnection(grade_model), data = list(X = studentdata$G3),
inits = list(.RNG.name = "base::Wichmann-Hill", .RNG.seed = 1989))
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1044
## Unobserved stochastic nodes: 2
## Total graph size: 1054
##
## Initializing model
grade_sim <- update(grade_jags, n.iter = 10000)
# SIMULATE the posterior
n = 20000
grade_sim <- coda.samples(model = grade_jags,
variable.names = c("theta","sigma"),
n.iter= n)
head(grade_sim)
## [[1]]
## Markov Chain Monte Carlo (MCMC) output:
## Start = 11001
## End = 11007
## Thinning interval = 1
## sigma theta
## [1,] 19.11256 56.62505
## [2,] 19.10750 56.02039
## [3,] 19.15923 56.58185
## [4,] 19.45941 57.79044
## [5,] 19.55191 56.88794
## [6,] 19.69422 57.47517
## [7,] 19.24144 56.91508
##
## attr(,"class")
## [1] "mcmc.list"
summary(grade_sim)
##
## Iterations = 11001:31000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 20000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## sigma 19.34 0.4249 0.003005 0.003837
## theta 56.71 0.6041 0.004272 0.004272
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## sigma 18.52 19.05 19.34 19.62 20.20
## theta 55.54 56.31 56.72 57.13 57.89
# PLOT the posterior
plot(grade_sim, density = FALSE)
# PLOT the posterior
plot(grade_sim, trace = FALSE)
# Store the chains in a data frame
grade_chains <- data.frame(sim = 1:n, grade_sim[[1]])
# Check out the head of grade_chains
head(grade_chains)
## sim sigma theta
## 1 1 19.11256 56.62505
## 2 2 19.10750 56.02039
## 3 3 19.15923 56.58185
## 4 4 19.45941 57.79044
## 5 5 19.55191 56.88794
## 6 6 19.69422 57.47517
mean(grade_chains$sigma)
## [1] 19.34183
mean(grade_chains$theta)
## [1] 56.7144
range(grade_chains$theta)
## [1] 53.98904 58.88349
It is desirable to have stability in the trace plot (no patterns) and good mixing (i.e. the chain traverses the parameter space without getting stuck or missing a section). Our trace plots for sigma and theta show no patterns and good mixing, indicating stability. The density plots are also approximately normal. From the summary function we see that the standard error of the mean of the MCMC chain is about 0.004; a high ratio of mean to standard error is desirable, and that is what we have. The sampled mean grade ranges roughly between 54 and 59, with a 95% credible interval of 55.5 to 57.9, while the posterior for the standard deviation is concentrated around 19.3 (no longer the uniform prior we started with).
#Running multiple chains
# COMPILE the model
grade_jags_multi <- jags.model(textConnection(grade_model),
data = list(X = studentdata$G3),
n.chains = 4)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1044
## Unobserved stochastic nodes: 2
## Total graph size: 1054
##
## Initializing model
# UPDATE the model
grade_sim_multi <- update(grade_jags_multi,
n.iter = 10000)
# SIMULATE the posterior
grade_sim_multi <- coda.samples(model = grade_jags_multi,
variable.names = c("theta", "sigma"),
n.iter = 20000)
head(grade_sim_multi)
## [[1]]
## Markov Chain Monte Carlo (MCMC) output:
## Start = 11001
## End = 11007
## Thinning interval = 1
## sigma theta
## [1,] 19.46788 56.70482
## [2,] 19.49131 57.25645
## [3,] 19.61351 57.09870
## [4,] 19.16593 56.68187
## [5,] 19.26049 55.97881
## [6,] 19.98217 56.91784
## [7,] 19.61263 58.41211
##
## [[2]]
## Markov Chain Monte Carlo (MCMC) output:
## Start = 11001
## End = 11007
## Thinning interval = 1
## sigma theta
## [1,] 19.39087 57.29508
## [2,] 18.85324 56.78939
## [3,] 19.05277 56.76116
## [4,] 18.57001 57.75921
## [5,] 19.04097 57.22628
## [6,] 19.28623 56.82195
## [7,] 19.23650 56.32212
##
## [[3]]
## Markov Chain Monte Carlo (MCMC) output:
## Start = 11001
## End = 11007
## Thinning interval = 1
## sigma theta
## [1,] 19.39708 57.53342
## [2,] 19.18729 56.35886
## [3,] 19.27617 56.56582
## [4,] 19.28493 57.59569
## [5,] 19.37216 57.04807
## [6,] 19.53782 55.98656
## [7,] 20.15229 54.67981
##
## [[4]]
## Markov Chain Monte Carlo (MCMC) output:
## Start = 11001
## End = 11007
## Thinning interval = 1
## sigma theta
## [1,] 18.58743 57.02325
## [2,] 18.69038 56.11065
## [3,] 18.76972 55.84609
## [4,] 19.39916 56.52946
## [5,] 19.19110 56.04533
## [6,] 18.92029 55.64142
## [7,] 20.29983 56.23160
##
## attr(,"class")
## [1] "mcmc.list"
# Construct trace plots of the theta and sigma chains
plot(grade_sim_multi, density = FALSE)
# Construct density plots of the theta and sigma chains
plot(grade_sim_multi, trace = FALSE)
grade_chains_multi <- data.frame(sim = rep(1:20000,4),
chain = c(rep(1,20000),
rep(2,20000),
rep(3,20000),
rep(4,20000)),
rbind(grade_sim_multi[[1]],
grade_sim_multi[[2]],
grade_sim_multi[[3]],
grade_sim_multi[[4]]))
grade_chains_multi %>% filter(sim<1000) %>%
ggplot(aes(x=sim,y=theta,color=as.factor(chain))) + geom_line() +
geom_smooth(aes(color=as.factor(chain)),se=FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
#Gelman-Rubin diagnostic
gelman.diag(grade_sim_multi)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## sigma 1 1
## theta 1 1
##
## Multivariate psrf
##
## 1
gelman.plot(grade_sim_multi)
autocorr.plot(grade_sim)
Here both theta and sigma appear to converge to 1 at around the same rate. This is what we want, as it tells us that the chains have converged to the stationary distribution.
There is no correlation between the iterates of \(\theta\), and only minimal correlation for \(\sigma\).
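The low autocorrelation can also be checked numerically with coda's effective sample size, which should be close to the nominal 20,000 draws when successive iterates are nearly independent:

```r
# Effective sample size of each parameter's chain
# (coda is already loaded as a dependency of rjags)
effectiveSize(grade_sim)
```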
ci95 <- quantile(grade_chains_multi$theta , probs = c(0.025 , 0.975))
ggplot(grade_chains_multi , aes(x=theta)) + geom_density() + geom_vline(xintercept = ci95 , col = 'red' , lty = 'dashed')
Here there is a 95% chance that the mean grade is between 55.54 and 57.87.
###Bayesian Regression (Time spent studying and grades)
studytime_model <- "model{
# Likelihood model for Y[i]
for(i in 1:length(Y)) {
m[i] <- a + b * X[i]
Y[i] ~ dnorm(m[i], s^(-2))
}
# Define the a, b, s priors
a ~ dnorm(50, 25^(-2))
b ~ dnorm(2, 1^(-2))
s ~ dunif(0, 30)}"
studytime_jags <- jags.model(textConnection(studytime_model),
data = list(Y = studentdata$G3,
X = studentdata$studytime),
n.chains = 4)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1044
## Unobserved stochastic nodes: 3
## Total graph size: 2109
##
## Initializing model
# BURN IN the model
studytime_sim <- update(studytime_jags, n.iter = 10000)
# SIMULATE the posterior
studytime_sim <- coda.samples(model=studytime_jags,
variable.names=c("a","b","s"),
n.iter=20000,
thin = 10)
plot(studytime_sim[[1]][,1], main = 'a')
plot(studytime_sim[[1]][,2], main = 'b')
plot(studytime_sim[[1]][,3], main = 's')
gelman.diag(studytime_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## a 1 1
## b 1 1
## s 1 1
##
## Multivariate psrf
##
## 1
gelman.plot(studytime_sim[,1:3])
autocorr.plot(studytime_sim[[1]][,1:3])
Since the median converges to 1 quite quickly, this suggests the chains have converged to the stationary distribution.
studytime_chains <- data.frame(studytime_sim[[1]])
quantile(studytime_chains$b, probs = c(0.025,0.975))
## 2.5% 97.5%
## 2.020730 4.248339
# Summarise the posterior Markov chains
summary(studytime_sim)[1]
## $statistics
## Mean SD Naive SE Time-series SE
## a 50.47253 1.279909 0.014309822 0.015646095
## b 3.16150 0.576989 0.006450933 0.007142316
## s 19.10545 0.421432 0.004711753 0.004714921
# Summarise the posterior Markov chains
summary(studytime_sim)[2]
## $quantiles
## 2.5% 25% 50% 75% 97.5%
## a 47.974069 49.598762 50.47058 51.339959 52.982886
## b 2.029859 2.762809 3.16667 3.551107 4.291609
## s 18.311864 18.822344 19.08858 19.383174 19.959805
ggplot(studytime_chains, aes(x = b)) + geom_density() +
geom_vline(xintercept = quantile(studytime_chains$b, probs = c(0.025,0.975)), color = 'red')
Model_studytime <- dic.samples(model = studytime_jags ,n.iter=1000)
There is a positive association between time studied and grade achieved: for every 1-unit increase in study time, the end-of-semester grade (G3) increases by about 3.16. The 95% credible interval for this change is 2.02 to 4.25 per unit increase in study time.
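The fitted chains can also be used for prediction. As an illustrative sketch (reusing the `studytime_chains` data frame stored above), a posterior predictive interval for a hypothetical student reporting a study-time value of 3:

```r
# Posterior predictive draws for a hypothetical new student with studytime = 3
set.seed(1)
x_new <- 3
pred  <- rnorm(nrow(studytime_chains),
               mean = studytime_chains$a + studytime_chains$b * x_new,
               sd   = studytime_chains$s)
quantile(pred, probs = c(0.025, 0.975))  # 95% posterior predictive interval for G3
```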
##grades per day missed
absence_model <- "model{
# Likelihood model for Y[i]
for(i in 1:length(Y)) {
m[i] <- a + b * X[i]
Y[i] ~ dnorm(m[i], s^(-2))
}
# Define the a, b, s priors
a ~ dnorm(50, 25^(-2))
b ~ dnorm(1, 1^(-2))
s ~ dunif(0, 30)}"
absence_jags <- jags.model(textConnection(absence_model),
data = list(Y = studentdata$G3,
X = studentdata$absences),
n.chains = 4)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1044
## Unobserved stochastic nodes: 3
## Total graph size: 2171
##
## Initializing model
# BURN IN the model
absence_sim <- update(absence_jags, n.iter = 10000)
# SIMULATE the posterior
absence_sim <- coda.samples(model=absence_jags,
variable.names=c("a","b","s"),
n.iter=20000,
thin = 10)
plot(absence_sim[[1]][,1], main = 'a')
plot(absence_sim[[1]][,2], main = 'b')
plot(absence_sim[[1]][,3], main = 's')
It is desirable to have stability in the trace plot (no patterns) and good mixing (i.e. the chain traverses the parameter space without getting stuck or missing a section). Here the trace plots show good mixing for a, b and s, although the stability is not as clean as we would like. The density plots are in line with what we expected given the priors used, and the ratio of mean to standard error is high, as desired.
gelman.diag(absence_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## a 1 1
## b 1 1
## s 1 1
##
## Multivariate psrf
##
## 1
gelman.plot(absence_sim[,1:3])
autocorr.plot(absence_sim[[1]][,1:3])
Here a and b appear to converge to 1 at around the same rate. This is what we want, as it tells us that the chains have converged to the stationary distribution; s converges somewhat more slowly.
There is no notable correlation between the iterates of \(a\), \(b\) or \(s\).
absence_chains <- data.frame(absence_sim[[1]])
quantile(absence_chains$b, probs = c(0.025,0.975))
## 2.5% 97.5%
## -0.32286096 0.05545316
# Summarise the posterior Markov chains
summary(absence_sim)[1]
## $statistics
## Mean SD Naive SE Time-series SE
## a 57.2927767 0.73836643 0.008255188 0.008551258
## b -0.1306706 0.09636133 0.001077352 0.001074723
## s 19.3388035 0.42376888 0.004737880 0.004738294
# Summarise the posterior Markov chains
summary(absence_sim)[2]
## $quantiles
## 2.5% 25% 50% 75% 97.5%
## a 55.8021483 56.8135128 57.2935763 57.78189146 58.71829577
## b -0.3166962 -0.1972589 -0.1297698 -0.06534027 0.05972369
## s 18.5203958 19.0535638 19.3316646 19.62058911 20.19164178
ggplot(absence_chains, aes(x = b)) + geom_density() +
geom_vline(xintercept = quantile(absence_chains$b, probs = c(0.025,0.975)), color = 'red')
There is a weak negative association between absences and grade achieved: for every 1-unit increase in absences, the end-of-semester grade (G3) decreases by about 0.13. However, the 95% credible interval for this change, -0.32 to 0.06, includes zero, so the evidence for an effect is not strong.
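Since the credible interval includes zero, it is also informative to read the posterior probability of a negative slope directly off the chains (a small illustrative addition, reusing the `absence_chains` data frame above):

```r
# Posterior probability that the absences coefficient is negative
mean(absence_chains$b < 0)
```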
#Grades and Time spent out with friends/ extra curricular
goout_model <- "model{
# Likelihood model for Y[i]
for(i in 1:length(Y)) {
m[i] <- beta0 + beta1 * X[i]
Y[i] ~ dnorm(m[i], s^(-2))
}
# Define the a, b, s priors
beta0 ~ dnorm(50, 30^(-2))
beta1 ~ dnorm(1, 2^(-2))
s ~ dunif(0, 30)}"
goout_jags <- jags.model(textConnection(goout_model),
data = list(Y = studentdata$G3,
X = studentdata$goout),
n.chains = 4)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1044
## Unobserved stochastic nodes: 3
## Total graph size: 2110
##
## Initializing model
# BURN IN the model
goout_sim <- update(goout_jags, n.iter = 10000)
# SIMULATE the posterior
goout_sim <- coda.samples(model=goout_jags,
variable.names=c("beta0","beta1","s"),
n.iter=20000,
thin = 10)
plot(goout_sim[[1]][,1], main = 'beta0')
plot(goout_sim[[1]][,2], main = 'beta1')
plot(goout_sim[[1]][,3], main = 's')
summary(goout_sim)
##
## Iterations = 11010:31000
## Thinning interval = 10
## Number of chains = 4
## Sample size per chain = 2000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta0 61.338 1.6919 0.018916 0.024335
## beta1 -1.469 0.5006 0.005597 0.007375
## s 19.264 0.4216 0.004713 0.004635
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta0 58.061 60.207 61.340 62.474 64.6327
## beta1 -2.442 -1.808 -1.472 -1.136 -0.4951
## s 18.444 18.979 19.259 19.542 20.0929
It is desirable to have stability in the trace plot (no patterns) and good mixing (i.e. the chain traverses the parameter space without getting stuck or missing a section). Here the trace plots show good mixing for beta0, beta1 and s, the density plots are in line with the priors used, and the ratio of mean to standard error is high, as desired.
For every 1-unit increase in time spent out with friends, the end-of-semester grade (G3) decreases by about 1.47. The 95% credible interval tells us the change per unit increase ranges from -2.44 to -0.50.
gelman.diag(goout_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## beta0 1 1
## beta1 1 1
## s 1 1
##
## Multivariate psrf
##
## 1
gelman.plot(goout_sim[,1:3])
autocorr.plot(goout_sim[[1]][,1:3])
Here beta0 and beta1 appear to converge to 1 at around the same rate. This is what we want, as it tells us that the chains have converged to the stationary distribution. There is no notable correlation between the iterates of \(\beta_0\), \(\beta_1\) or \(s\).
###Bayesian Multiple Regression
# DEFINE the model
grade_model_mult <- "model{
# Define model for data Y[i]
for(i in 1:length(Y)) {
Y[i] ~ dnorm(m[i], s^(-2))
m[i] <- beta0 + beta1*X1[i] + beta2*X2[i]
}
# Define the a, b, c, d and s priors
beta0 ~ dnorm(0, 50^(-2))
beta1 ~ dnorm(3, 2.5^(-2)) #studytime
beta2 ~ dnorm(1, 2^(-2)) #absences
s ~ dunif(0, 20)
}"
# COMPILE the model
grades_jags_mult <- jags.model(textConnection(grade_model_mult),
data = list(Y = studentdata$G3,
X1 = studentdata$studytime, X2 = studentdata$absences), n.chains = 4)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1044
## Unobserved stochastic nodes: 4
## Total graph size: 3267
##
## Initializing model
# BURN IN the model
grades_sim_mult <- update(grades_jags_mult,n.iter=10000)
# SIMULATE the posterior
grades_sim_mult <- coda.samples(model=grades_jags_mult,
variable.names=c("beta0","beta1","beta2","s"),
n.iter=20000)
plot(grades_sim_mult[[1]][,1], density = TRUE, main = "beta0")
plot(grades_sim_mult[[1]][,2], density = TRUE, main = "beta1")
plot(grades_sim_mult[[1]][,3], density = TRUE, main = "beta2")
plot(grades_sim_mult[[1]][,4], density = TRUE, main = "s")
It is desirable to have stability in the trace plot (no patterns) and good mixing (i.e. the chain traverses the parameter space without getting stuck or missing a section). Here the trace plots show good mixing for beta0, beta1, beta2 and s, the posterior density for s is again concentrated around 19, and the ratio of mean to standard error is high, as desired.
summary(grades_sim_mult)[1]
## $statistics
## Mean SD Naive SE Time-series SE
## beta0 49.9627911 1.54179615 0.0054510726 0.0188979957
## beta1 3.6510513 0.67866328 0.0023994370 0.0081416109
## beta2 -0.1021425 0.09537237 0.0003371922 0.0004933146
## s 19.0758405 0.39625571 0.0014009755 0.0017975336
summary(grades_sim_mult)[2]
## $quantiles
## 2.5% 25% 50% 75% 97.5%
## beta0 46.9245199 48.9236295 49.9700066 50.99635514 52.99035153
## beta1 2.3310877 3.1947667 3.6481523 4.10481551 4.98494983
## beta2 -0.2896561 -0.1662026 -0.1022943 -0.03833005 0.08534397
## s 18.2929241 18.8019084 19.0795138 19.35719913 19.82838479
Model_1 <- dic.samples(model = grades_jags_mult ,n.iter=1000)
For every 1-unit increase in study time, the end-of-semester grade (G3) increases by about 3.65; for every 1-unit increase in absences, it decreases by about 0.10.
#Studytime, Absences, time spent out with friends
# DEFINE the model
grade_model_mult2 <- "model{
# Define model for data Y[i]
for(i in 1:length(Y)) {
Y[i] ~ dnorm(m[i], s^(-2))
m[i] <- beta0 + beta1*X1[i] + beta2*X2[i] + beta3*X3[i]
}
# Define the a, b, c, d and s priors
beta0 ~ dnorm(0, 50^(-2))
beta1 ~ dnorm(3, 2.5^(-2)) #studytime
beta2 ~ dnorm(1, 2^(-2)) #absences
beta3 ~ dnorm(-1, 0.26^(-2)) #freetime (after school)
s ~ dunif(0, 20)
}"
Again, it is desirable to have stability in the trace plots (no patterns) and good mixing (i.e. the chain traverses the parameter space without getting stuck or missing a section). The trace plots below show good mixing for beta0, beta1, beta2, beta3 and s, and the posterior density for s is again concentrated around 19.
# COMPILE the model
grades_jags_mult2 <- jags.model(textConnection(grade_model_mult2),
data = list(Y = studentdata$G3,
X1 = studentdata$studytime, X2 = studentdata$absences, X3 = studentdata$freetime), n.chains = 4)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1044
## Unobserved stochastic nodes: 5
## Total graph size: 4449
##
## Initializing model
# BURN IN the model
grades_sim_mult2 <- update(grades_jags_mult2,n.iter=10000)
# SIMULATE the posterior
grades_sim_mult2 <- coda.samples(model=grades_jags_mult2,
variable.names=c("beta0","beta1","beta2","beta3","s"),
n.iter=20000)
plot(grades_sim_mult2[[1]][,1], density = TRUE, main = "beta0")
plot(grades_sim_mult2[[1]][,2], density = TRUE, main = "beta1")
plot(grades_sim_mult2[[1]][,3], density = TRUE, main = "beta2")
plot(grades_sim_mult2[[1]][,4], density = TRUE, main = "beta3")
plot(grades_sim_mult2[[1]][,5], density = TRUE, main = "s")
summary(grades_sim_mult2)[1]
## $statistics
## Mean SD Naive SE Time-series SE
## beta0 53.3794097 1.75783967 0.0062149017 0.0231135684
## beta1 3.5436076 0.68384340 0.0024177515 0.0082230925
## beta2 -0.1083690 0.09550371 0.0003376566 0.0005078214
## beta3 -0.9924095 0.23712176 0.0008383520 0.0017622177
## s 19.0562785 0.39843221 0.0014086706 0.0017932980
summary(grades_sim_mult2)[2]
## $quantiles
## 2.5% 25% 50% 75% 97.5%
## beta0 49.927748 52.1885446 53.3832743 54.57213918 56.79490292
## beta1 2.196014 3.0821263 3.5450330 4.01061867 4.87299252
## beta2 -0.295884 -0.1729357 -0.1081799 -0.04420572 0.07847644
## beta3 -1.456886 -1.1525432 -0.9925750 -0.83276303 -0.52745449
## s 18.270806 18.7815943 19.0564248 19.33541976 19.81848355
# Calculate the deviance and pD
Model_2 <- dic.samples(model = grades_jags_mult2,n.iter=1000)
For every 1-unit increase in study time, the end-of-semester grade (G3) increases by about 3.54; for every 1-unit increase in absences, it decreases by about 0.11; and for every 1-unit increase in free time, it decreases by about 0.99.
Model_studytime
## Mean deviance: 9121
## penalty 2.726
## Penalized deviance: 9123
Model_1
## Mean deviance: 9120
## penalty 3.953
## Penalized deviance: 9124
Model_2
## Mean deviance: 9117
## penalty 3.961
## Penalized deviance: 9121
Model_2 has the lowest penalized deviance (DIC of about 9121), suggesting the best fit once model complexity is accounted for, even though Model_studytime carries the smallest penalty term (i.e. is the simplest model). Combining these, the DIC is lowest for Model_2, so it is the preferred model.
###Conclusion
Of the three models we have run (Model_studytime: G3 ~ Studytime; Model_1: G3 ~ Studytime + Absences; Model_2: G3 ~ Studytime + Absences + Freetime), we have found that Model_2 is the optimal model.
Our aim was to create a model that can accurately predict end-of-year grades from student information. Although we have not tried every combination of variables in our dataset, we compared three different models and identified the preferred one using DIC: G3 ~ Studytime + Absences + Freetime. Given more time, we would like to split the dataset in two, using one part to train our model and the other to test it. This would allow us to build a model that accurately predicts grades for Leaving Certificate students.
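The train/test split described could be sketched as follows (hypothetical code, not run as part of this report; the 80/20 split ratio is our own choice):

```r
# Sketch of an 80/20 train/test split of the student data
set.seed(2020)
idx   <- sample(nrow(studentdata), size = floor(0.8 * nrow(studentdata)))
train <- studentdata[idx, ]
test  <- studentdata[-idx, ]
# Refit the preferred model on the training rows only, then compare its
# posterior predictive means for the test students against test$G3,
# e.g. via root mean squared error.
```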